class U_I18N_API UnicodeSet

A mutable set of Unicode characters

Public Methods

const UnicodeString& getPairs() const
Returns the representation of this set as a list of character ranges
UnicodeSet()
Constructs an empty set
UnicodeSet(const UnicodeString& pattern, UErrorCode& status)
Constructs a set from the given pattern
UnicodeSet(const UnicodeString& pattern, bool_t ignoreSpaces, UErrorCode& status)
Constructs a set from the given pattern, optionally ignoring white space
UnicodeSet(int8_t category, UErrorCode& status)
Constructs a set from the given Unicode character category
UnicodeSet(const UnicodeSet& o)
Constructs a set that is identical to the given UnicodeSet
virtual ~UnicodeSet()
Destructs the set
UnicodeSet& operator=(const UnicodeSet& o)
Assigns this object to be a copy of another
virtual bool_t operator==(const UnicodeSet& o) const
Compares the specified object with this set for equality
bool_t operator!=(const UnicodeSet& o) const
Compares the specified object with this set for inequality
virtual int32_t hashCode() const
Returns the hash code value for this set
virtual void applyPattern(const UnicodeString& pattern, bool_t ignoreSpaces, UErrorCode& status)
Modifies this set to represent the set specified by the given pattern, optionally ignoring white space
void applyPattern(const UnicodeString& pattern, UErrorCode& status)
Modifies this set to represent the set specified by the given pattern
virtual UnicodeString& toPattern(UnicodeString& result) const
Returns a string representation of this set
virtual int32_t size() const
Returns the number of elements in this set (its cardinality), n, where 0 <= n <= 65536
virtual bool_t isEmpty() const
Returns true if this set contains no elements
virtual bool_t contains(UChar first, UChar last) const
Returns true if this set contains the specified range of chars
virtual bool_t contains(UChar c) const
Returns true if this set contains the specified char
virtual void add(UChar first, UChar last)
Adds the specified range to this set if it is not already present
virtual void add(UChar c)
Adds the specified character to this set if it is not already present
virtual void remove(UChar first, UChar last)
Removes the specified range from this set if it is present
virtual void remove(UChar c)
Removes the specified character from this set if it is present
virtual bool_t containsAll(const UnicodeSet& c) const
Returns true if the specified set is a subset of this set
virtual void addAll(const UnicodeSet& c)
Adds all of the elements in the specified set to this set if they're not already present
virtual void retainAll(const UnicodeSet& c)
Retains only the elements in this set that are contained in the specified set
virtual void removeAll(const UnicodeSet& c)
Removes from this set all of its elements that are contained in the specified set
virtual void complement()
Inverts this set
virtual void clear()
Removes all of the elements from this set

Documentation

A mutable set of Unicode characters. Objects of this class represent character classes used in regular expressions. Such classes specify a subset of the set of all Unicode characters, which in this implementation consists of the characters from U+0000 to U+FFFF, ignoring surrogates.

This class supports two APIs. The first is modeled after Java 2's java.util.Set interface, although this class does not implement that interface. All methods of Set are supported, with the modification that they take a character range or single character instead of an Object, and they take a UnicodeSet instead of a Collection.

The second API is the applyPattern()/toPattern() API familiar from the Format-derived classes. Unlike the methods that add or remove individual characters and ranges, applyPattern() replaces the entire contents of a UnicodeSet at once, based on a string pattern.

In addition, the set complement operation is supported through the complement() method.

Pattern syntax

Patterns are accepted by the constructors and the applyPattern() methods and returned by the toPattern() method. These patterns follow a syntax similar to that employed by version 8 regular expression character classes:
pattern := ('[' '^'? item* ']') | ('[:' '^'? category ':]')
item := char | (char '-' char) | pattern-expr
pattern-expr := pattern | pattern-expr pattern | pattern-expr op pattern
op := '&' | '-'
special := '[' | ']' | '-'
char := any character that is not special | ('\' any character) | ('\u' hex hex hex hex)
hex := any hex digit, as defined by Unicode::digit(c, 16)

Legend:
a:=b   a may be replaced by b
a?     zero or one instance of a
a*     zero or more instances of a
a|b    either a or b
'a'    the literal string between the quotes

Patterns specify individual characters, ranges of characters, and Unicode character categories. When elements are concatenated, they specify their union. To complement a set, place a '^' immediately after the opening '[' or '[:'. In any other location, '^' has no special meaning.

Ranges are indicated by placing a '-' between two characters, as in "a-z". This specifies the range of all characters from the left character to the right character, inclusive, in Unicode order. If the left and right characters are the same, the range consists of just that character. If the left character is greater than the right character, it is a syntax error. If a '-' occurs as the first character after the opening '[' or '[^', or as the last character before the closing ']', it is taken as a literal. Thus "[a\-b]", "[-ab]", and "[ab-]" all indicate the same set of three characters: 'a', 'b', and '-'.
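The range and literal-'-' rules above can be illustrated with a small, hypothetical parser for flat bracket expressions. This is a sketch, not UnicodeSet's implementation: parseFlat(), normalize(), invert() and the Ranges alias are illustrative names, characters are 16-bit code units held in uint32_t, errors are reported with C++ exceptions instead of a UErrorCode, and nested sets, categories, the '&'/'-' operators, \uhhhh escapes and ignoreSpaces are all left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Inclusive [first, last] ranges of 16-bit code units, widened to uint32_t.
using Ranges = std::vector<std::pair<uint32_t, uint32_t>>;

// Sort, then merge overlapping or adjacent ranges ("a-c" + "d-f" -> "a-f").
Ranges normalize(Ranges r) {
    std::sort(r.begin(), r.end());
    Ranges out;
    for (const auto& p : r) {
        assert(p.first <= p.second);
        if (!out.empty() && p.first <= out.back().second + 1)
            out.back().second = std::max(out.back().second, p.second);
        else
            out.push_back(p);
    }
    return out;
}

// Complement with respect to U+0000..U+FFFF.
Ranges invert(const Ranges& r) {
    Ranges out;
    uint32_t next = 0;
    for (const auto& p : r) {
        if (p.first > next) out.push_back({next, p.first - 1});
        next = p.second + 1;
    }
    if (next <= 0xFFFF) out.push_back({next, 0xFFFF});
    return out;
}

// Parses "[...]" or "[^...]" containing only chars, ranges and '\' escapes.
Ranges parseFlat(const std::u16string& pat) {
    if (pat.size() < 2 || pat.front() != u'[' || pat.back() != u']')
        throw std::invalid_argument("pattern must be enclosed in [ ]");
    const size_t end = pat.size() - 1;  // index of the closing ']'
    size_t i = 1;
    bool negate = false;
    if (i < end && pat[i] == u'^') { negate = true; ++i; }
    const size_t bodyStart = i;

    // Reads one character at i, honoring a '\' escape; reports whether it was escaped.
    auto readChar = [&](bool& escaped) -> uint32_t {
        escaped = (pat[i] == u'\\');
        if (escaped && ++i >= end) throw std::invalid_argument("dangling '\\'");
        uint32_t c = pat[i++];
        if (!escaped && c == u'[') throw std::invalid_argument("nested sets not handled");
        return c;
    };

    Ranges r;
    while (i < end) {
        const size_t at = i;
        bool esc = false;
        uint32_t lo = readChar(esc);
        // An unescaped '-' is literal only right after '[' / '[^' or right before ']'.
        if (!esc && lo == u'-' && at != bodyStart && i != end)
            throw std::invalid_argument("misplaced '-'");
        uint32_t hi = lo;
        // "x-y" forms a range unless that '-' is the last character before ']'.
        if (i + 1 < end && pat[i] == u'-') {
            ++i;  // consume '-'
            hi = readChar(esc);
            if (lo > hi) throw std::invalid_argument("range out of order");
        }
        r.push_back({lo, hi});
    }
    r = normalize(r);
    return negate ? invert(r) : r;
}
```

With this sketch, "[a\-b]", "[-ab]" and "[ab-]" all parse to the same two ranges, '-' and 'a'-'b', while "[z-a]" is rejected as a syntax error.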

Sets may be intersected using the '&' operator, or their asymmetric set difference may be taken using the '-' operator. For example, "[[:L:]&[\u0000-\u0FFF]]" indicates the set of all Unicode letters with values less than 4096. The operators ('&' and '-') have equal precedence and bind left to right. Thus "[[:L:]-[a-z]-[\u0100-\u01FF]]" is equivalent to "[[[:L:]-[a-z]]-[\u0100-\u01FF]]". The grouping matters only when difference is involved; a chain of intersections gives the same result under any grouping, since intersection is associative and commutative.
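To see why left-to-right binding matters for difference but not for intersection, the sketch below models sets as sorted, disjoint, non-adjacent inclusive ranges and evaluates an operator chain left to right. intersect(), complementOf(), subtract(), evalLeftToRight() and Ranges are hypothetical helpers, not part of UnicodeSet; small concrete sets stand in for [:L:], since category data is out of scope here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Sorted, disjoint, non-adjacent inclusive ranges over U+0000..U+FFFF.
using Ranges = std::vector<std::pair<uint32_t, uint32_t>>;

// '&': overlap of two range lists, found with a two-pointer sweep.
Ranges intersect(const Ranges& a, const Ranges& b) {
    Ranges out;
    size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        uint32_t lo = std::max(a[i].first, b[j].first);
        uint32_t hi = std::min(a[i].second, b[j].second);
        if (lo <= hi) out.push_back({lo, hi});
        if (a[i].second < b[j].second) ++i; else ++j;  // drop the range that ends first
    }
    return out;
}

// Complement with respect to U+0000..U+FFFF.
Ranges complementOf(const Ranges& r) {
    Ranges out;
    uint32_t next = 0;
    for (const auto& p : r) {
        if (p.first > next) out.push_back({next, p.first - 1});
        next = p.second + 1;
    }
    if (next <= 0xFFFF) out.push_back({next, 0xFFFF});
    return out;
}

// '-': asymmetric difference, computed as a & ~b.
Ranges subtract(const Ranges& a, const Ranges& b) { return intersect(a, complementOf(b)); }

// Evaluates "acc op1 s1 op2 s2 ..." strictly left to right (equal precedence).
Ranges evalLeftToRight(Ranges acc, const std::vector<std::pair<char, Ranges>>& rest) {
    for (const auto& step : rest)
        acc = (step.first == '&') ? intersect(acc, step.second) : subtract(acc, step.second);
    return acc;
}
```

For A = [0-9a-z], B = [a-f] and C = [0-4], evaluating A - B - C left to right gives [5-9g-z], whereas grouping it as A - (B - C) gives [0-9g-z]; swapping the operands of an intersection never changes its result.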

Examples:
[a]              The set containing 'a'
[a-z]            The set containing 'a' through 'z' and all characters in between, in Unicode order
[^a-z]           The set containing all characters but 'a' through 'z', that is, U+0000 through 'a'-1 and 'z'+1 through U+FFFF
[[pat1][pat2]]   The union of the sets specified by pat1 and pat2
[[pat1]&[pat2]]  The intersection of the sets specified by pat1 and pat2
[[pat1]-[pat2]]  The asymmetric difference of the sets specified by pat1 and pat2
[:Lu:]           The set of characters belonging to the given Unicode category, as defined by Unicode::getType(); in this case, Unicode uppercase letters
[:L:]            The set of characters belonging to all Unicode categories starting with 'L', that is, [[:Lu:][:Ll:][:Lt:][:Lm:][:Lo:]]

Character categories

Character categories are specified using the POSIX-like syntax '[:Lu:]'. The complement of a category is specified by inserting '^' after the opening '[:'. The following category names are recognized. Category membership is determined by Unicode::getType(), so it reflects the Unicode data underlying that function.

Normative
Mn = Mark, Non-Spacing
Mc = Mark, Spacing Combining
Me = Mark, Enclosing

Nd = Number, Decimal Digit
Nl = Number, Letter
No = Number, Other

Zs = Separator, Space
Zl = Separator, Line
Zp = Separator, Paragraph

Cc = Other, Control
Cf = Other, Format
Cs = Other, Surrogate
Co = Other, Private Use
Cn = Other, Not Assigned

Informative
Lu = Letter, Uppercase
Ll = Letter, Lowercase
Lt = Letter, Titlecase
Lm = Letter, Modifier
Lo = Letter, Other

Pc = Punctuation, Connector
Pd = Punctuation, Dash
Ps = Punctuation, Open
Pe = Punctuation, Close
Pi = Punctuation, Initial quote
Pf = Punctuation, Final quote
Po = Punctuation, Other

Sm = Symbol, Math
Sc = Symbol, Currency
Sk = Symbol, Modifier
So = Symbol, Other

const UnicodeString& getPairs() const
Returns the representation of this set as a list of character ranges. Ranges are listed in ascending Unicode order. For example, the set [a-zA-M3] is represented as "33AMaz".
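The pairs layout can be sketched as follows: each range contributes two code units, its first and its last character. This assumes the set is held as sorted, disjoint, non-adjacent inclusive ranges; merging adjacent ranges is an assumption of this sketch, and toPairs(), fromPairs() and Ranges are hypothetical names rather than UnicodeSet members.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Sorted, disjoint, non-adjacent inclusive ranges (assumed canonical form).
using Ranges = std::vector<std::pair<uint32_t, uint32_t>>;

// Encodes ranges as (first, last) code-unit pairs in ascending order.
std::u16string toPairs(const Ranges& canonical) {
    std::u16string pairs;
    for (const auto& r : canonical) {
        pairs.push_back(static_cast<char16_t>(r.first));
        pairs.push_back(static_cast<char16_t>(r.second));
    }
    return pairs;
}

// Decodes a pairs string back into ranges.
Ranges fromPairs(const std::u16string& pairs) {
    assert(pairs.size() % 2 == 0 && "a pairs string always has even length");
    Ranges out;
    for (size_t i = 0; i + 1 < pairs.size(); i += 2)
        out.push_back({pairs[i], pairs[i + 1]});
    return out;
}
```

For [a-zA-M3], the ranges in ascending order are 3-3, A-M and a-z, which toPairs() encodes as "33AMaz"; a single character such as '3' appears twice because it is a one-character range.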

UnicodeSet()
Constructs an empty set

UnicodeSet(const UnicodeString& pattern, UErrorCode& status)
Constructs a set from the given pattern. See the class description for the syntax of the pattern language.
Parameters:
pattern - a string specifying what characters are in the set
status - error code; set to a failure value if the pattern contains a syntax error

UnicodeSet(const UnicodeString& pattern, bool_t ignoreSpaces, UErrorCode& status)
Constructs a set from the given pattern, optionally ignoring white space. See the class description for the syntax of the pattern language.
Parameters:
pattern - a string specifying what characters are in the set
ignoreSpaces - if true, all spaces in the pattern are ignored, except those preceded by '\'. Spaces are those characters for which Unicode::isSpaceChar() is true.
status - error code; set to a failure value if the pattern contains a syntax error

UnicodeSet(int8_t category, UErrorCode& status)
Constructs a set from the given Unicode character category
Parameters:
category - an integer indicating the character category, as returned by Unicode::getType().
status - error code; set to a failure value if the given category is invalid

UnicodeSet(const UnicodeSet& o)
Constructs a set that is identical to the given UnicodeSet

virtual ~UnicodeSet()
Destructs the set

UnicodeSet& operator=(const UnicodeSet& o)
Assigns this object to be a copy of another

virtual bool_t operator==(const UnicodeSet& o) const
Compares the specified object with this set for equality. Returns true if the two sets have the same size, and every member of the specified set is contained in this set (or equivalently, every member of this set is contained in the specified set).
Returns:
true if the specified set is equal to this set.
Parameters:
o - set to be compared for equality with this set.

bool_t operator!=(const UnicodeSet& o) const
Compares the specified object with this set for inequality. Returns true if the specified set is not equal to this set.

virtual int32_t hashCode() const
Returns the hash code value for this set.
Returns:
the hash code value for this set.
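Equality and hashing fit together naturally when the set is kept in a canonical range form: equal sets then have identical range lists, so comparing the lists decides equality, and hashing the same lists guarantees that equal sets hash alike. A minimal sketch under that assumption; sameSet(), hashRanges(), Ranges and the multiplier 31 are illustrative choices, not UnicodeSet's actual hash.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Sorted, disjoint, non-adjacent inclusive ranges (assumed canonical form).
using Ranges = std::vector<std::pair<uint32_t, uint32_t>>;

// With a canonical representation, "same size and every member of one set is
// in the other" reduces to comparing the two range lists element by element.
bool sameSet(const Ranges& a, const Ranges& b) { return a == b; }

// Hashes exactly the data sameSet() compares, so equal sets always produce
// equal hash codes. The multiplier 31 is an arbitrary illustrative choice.
int32_t hashRanges(const Ranges& r) {
    uint32_t h = 0;  // unsigned arithmetic wraps around without overflow
    for (const auto& p : r) {
        h = h * 31u + p.first;
        h = h * 31u + p.second;
    }
    return static_cast<int32_t>(h);
}
```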

virtual void applyPattern(const UnicodeString& pattern, bool_t ignoreSpaces, UErrorCode& status)
Modifies this set to represent the set specified by the given pattern, optionally ignoring white space. See the class description for the syntax of the pattern language.
Parameters:
pattern - a string specifying what characters are in the set
ignoreSpaces - if true, all spaces in the pattern are ignored. Spaces are those characters for which Unicode::isSpaceChar() is true. Characters preceded by '\' are escaped, losing any special meaning they otherwise have; spaces may be included by escaping them.
status - error code; set to a failure value if the pattern contains a syntax error

void applyPattern(const UnicodeString& pattern, UErrorCode& status)
Modifies this set to represent the set specified by the given pattern. See the class description for the syntax of the pattern language.
Parameters:
pattern - a string specifying what characters are in the set
status - error code; set to a failure value if the pattern contains a syntax error

virtual UnicodeString& toPattern(UnicodeString& result) const
Stores a pattern string representing this set in result and returns a reference to result. Passing that pattern to a UnicodeSet constructor produces another set equal to this one.

virtual int32_t size() const
Returns the number of elements in this set (its cardinality), n, where 0 <= n <= 65536.
Returns:
the number of elements in this set (its cardinality).
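Because the set is stored as ranges, its cardinality can be computed without visiting individual characters: each inclusive range [first, last] contributes last - first + 1 elements. A minimal sketch, assuming a hypothetical Ranges list of inclusive ranges within U+0000..U+FFFF; cardinality() is an illustrative name.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Sorted, disjoint inclusive ranges within U+0000..U+FFFF.
using Ranges = std::vector<std::pair<uint32_t, uint32_t>>;

// Sums the length of every inclusive range.
int32_t cardinality(const Ranges& r) {
    int32_t n = 0;
    for (const auto& p : r) {
        assert(p.first <= p.second && p.second <= 0xFFFF);
        n += static_cast<int32_t>(p.second - p.first + 1);
    }
    return n;
}
```

For [a-zA-M3] this gives 1 + 13 + 26 = 40, and for the full range U+0000..U+FFFF it gives 65536, the upper bound stated above.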

virtual bool_t isEmpty() const
Returns true if this set contains no elements.
Returns:
true if this set contains no elements.

virtual bool_t contains(UChar first, UChar last) const
Returns true if this set contains the specified range of chars.
Returns:
true if this set contains the specified range of chars.

virtual bool_t contains(UChar c) const
Returns true if this set contains the specified char.
Returns:
true if this set contains the specified char.
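Both contains() overloads can be answered with a binary search over the ranges. If the ranges are sorted, disjoint and separated by gaps (an assumption of this sketch), a contiguous span [first, last] is contained only when it lies inside a single stored range. containsRange(), containsChar() and Ranges are hypothetical names, and the sketch treats first <= last as a precondition, which the description above does not specify.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Sorted, disjoint, non-adjacent inclusive ranges (assumed canonical form).
using Ranges = std::vector<std::pair<uint32_t, uint32_t>>;

bool containsRange(const Ranges& r, uint32_t first, uint32_t last) {
    assert(first <= last);
    // First stored range whose end is >= first (the list is sorted by end as well).
    auto it = std::lower_bound(r.begin(), r.end(), first,
        [](const std::pair<uint32_t, uint32_t>& p, uint32_t c) { return p.second < c; });
    return it != r.end() && it->first <= first && last <= it->second;
}

// A single character is just a one-character range.
bool containsChar(const Ranges& r, uint32_t c) { return containsRange(r, c, c); }
```

With the set [a-zA-M3], containsRange(s, 'a', 'z') is true, but containsRange(s, 'L', 'b') is false because that span crosses the gap after 'M'.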

virtual void add(UChar first, UChar last)
Adds the specified range to this set if it is not already present. If this set already contains the specified range, the call leaves this set unchanged. If first > last, the range is empty and the set is left unchanged.
Parameters:
first - first character, inclusive, of range to be added to this set.
last - last character, inclusive, of range to be added to this set.
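Adding a range to a range-based set means merging it with every stored range it overlaps or touches. A minimal sketch, assuming sorted, disjoint, non-adjacent inclusive ranges; addRange() and Ranges are hypothetical names, and the empty-range case follows the behavior described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Sorted, disjoint, non-adjacent inclusive ranges (assumed canonical form).
using Ranges = std::vector<std::pair<uint32_t, uint32_t>>;

void addRange(Ranges& r, uint32_t first, uint32_t last) {
    if (first > last) return;  // empty range: the set is left unchanged
    Ranges out;
    size_t i = 0;
    // Ranges that end before 'first' with at least one character of gap stay as-is.
    while (i < r.size() && r[i].second + 1 < first) out.push_back(r[i++]);
    // Absorb every range that overlaps or is adjacent to [first, last].
    while (i < r.size() && r[i].first <= last + 1) {
        first = std::min(first, r[i].first);
        last = std::max(last, r[i].second);
        ++i;
    }
    out.push_back({first, last});
    // Ranges entirely after the merged range are copied unchanged.
    while (i < r.size()) out.push_back(r[i++]);
    r.swap(out);
}
```

Adding 'N'-'Z' to [A-M] yields the single range A-Z, since the two ranges are adjacent; adding 'z'-'a' does nothing.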

virtual void add(UChar c)
Adds the specified character to this set if it is not already present. If this set already contains the specified character, the call leaves this set unchanged.

virtual void remove(UChar first, UChar last)
Removes the specified range from this set if it is present. The set will not contain the specified range once the call returns. If first > last, the range is empty and the set is left unchanged.
Parameters:
first - first character, inclusive, of range to be removed from this set.
last - last character, inclusive, of range to be removed from this set.
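Removing a range can leave up to two pieces of each overlapping stored range: the part before the removed span and the part after it. A minimal sketch, assuming sorted, disjoint, inclusive ranges; removeRange() and Ranges are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Sorted, disjoint, non-adjacent inclusive ranges (assumed canonical form).
using Ranges = std::vector<std::pair<uint32_t, uint32_t>>;

void removeRange(Ranges& r, uint32_t first, uint32_t last) {
    if (first > last) return;  // empty range: the set is left unchanged
    Ranges out;
    for (const auto& p : r) {
        if (p.second < first || p.first > last) {  // no overlap: keep whole range
            out.push_back(p);
            continue;
        }
        if (p.first < first) out.push_back({p.first, first - 1});  // part left of the hole
        if (p.second > last) out.push_back({last + 1, p.second});  // part right of the hole
    }
    r.swap(out);
}
```

Removing 'm' from [a-z] splits it into a-l and n-z; removing 'a'-'z' afterwards empties the set.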

virtual void remove(UChar c)
Removes the specified character from this set if it is present. The set will not contain the specified character once the call returns.

virtual bool_t containsAll(const UnicodeSet& c) const
Returns true if the specified set is a subset of this set.
Returns:
true if this set contains all of the elements of the specified set.
Parameters:
c - set to be checked for containment in this set.

virtual void addAll(const UnicodeSet& c)
Adds all of the elements in the specified set to this set if they're not already present. This operation effectively modifies this set so that its value is the union of the two sets. The behavior of this operation is unspecified if the specified set is modified while the operation is in progress.
Parameters:
c - set whose elements are to be added to this set.
See Also:
add(UChar, UChar)
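The union of two range lists can be computed by concatenating them, sorting, and merging ranges that overlap or touch. A minimal sketch, assuming sorted, disjoint, non-adjacent inclusive ranges; unite() and Ranges are hypothetical names. A linear two-way merge would avoid the sort, but the sort keeps the sketch short.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Sorted, disjoint, non-adjacent inclusive ranges (assumed canonical form).
using Ranges = std::vector<std::pair<uint32_t, uint32_t>>;

Ranges unite(const Ranges& a, const Ranges& b) {
    Ranges all(a);
    all.insert(all.end(), b.begin(), b.end());
    std::sort(all.begin(), all.end());
    Ranges out;
    for (const auto& p : all) {
        // Overlapping or adjacent: extend the last output range.
        if (!out.empty() && p.first <= out.back().second + 1)
            out.back().second = std::max(out.back().second, p.second);
        else
            out.push_back(p);
    }
    return out;
}
```

The union of [a-m] and [n-z] collapses into the single range a-z.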

virtual void retainAll(const UnicodeSet& c)
Retains only the elements in this set that are contained in the specified set. In other words, removes from this set all of its elements that are not contained in the specified set. This operation effectively modifies this set so that its value is the intersection of the two sets.
Parameters:
c - set that defines which elements this set will retain.
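Intersection of two range lists is a two-pointer sweep: at each step the overlap of the current pair of ranges (if any) is emitted, and whichever range ends first is dropped. A minimal sketch, assuming sorted, disjoint, non-adjacent inclusive ranges; intersectRanges() and Ranges are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Sorted, disjoint, non-adjacent inclusive ranges (assumed canonical form).
using Ranges = std::vector<std::pair<uint32_t, uint32_t>>;

Ranges intersectRanges(const Ranges& a, const Ranges& b) {
    Ranges out;
    size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        uint32_t lo = std::max(a[i].first, b[j].first);
        uint32_t hi = std::min(a[i].second, b[j].second);
        if (lo <= hi) out.push_back({lo, hi});          // the two ranges overlap
        if (a[i].second < b[j].second) ++i; else ++j;  // advance past the range that ends first
    }
    return out;
}
```

Retaining [X-c] within [A-Za-z] leaves X-Z and a-c.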

virtual void removeAll(const UnicodeSet& c)
Removes from this set all of its elements that are contained in the specified set. This operation effectively modifies this set so that its value is the asymmetric set difference of the two sets.
Parameters:
c - set that defines which elements will be removed from this set.

virtual void complement()
Inverts this set. This operation modifies this set so that its value is its complement with respect to U+0000..U+FFFF. It is equivalent to the pseudo-code: all = UnicodeSet("[\u0000-\uFFFF]"); all.removeAll(this); this = all.
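On a range list, complementing means emitting the gaps between consecutive ranges, plus the gaps before the first and after the last range, within U+0000..U+FFFF. A minimal sketch, assuming sorted, disjoint, non-adjacent inclusive ranges; complementRanges() and Ranges are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Sorted, disjoint, non-adjacent inclusive ranges (assumed canonical form).
using Ranges = std::vector<std::pair<uint32_t, uint32_t>>;

Ranges complementRanges(const Ranges& r) {
    Ranges out;
    uint32_t next = 0;  // first character not yet covered
    for (const auto& p : r) {
        if (p.first > next) out.push_back({next, p.first - 1});  // gap before this range
        next = p.second + 1;
    }
    if (next <= 0xFFFF) out.push_back({next, 0xFFFF});  // tail gap up to U+FFFF
    return out;
}
```

The complement of [a-z] is U+0000..'a'-1 plus 'z'+1..U+FFFF, and complementing twice returns the original set.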

virtual void clear()
Removes all of the elements from this set. This set will be empty after this call returns.


This class has no child classes.
Author:
Alan Liu



this page has been generated automatically by doc++
